Exponentiated Gradient versus Gradient Descent for Linear Predictors Produced as Part of the Esprit Working Group in Neural and Computational Learning, Neurocolt 8556

Authors

  • Jyrki Kivinen
  • Manfred K. Warmuth
Abstract

We consider two algorithms for on-line prediction based on a linear model. The algorithms are the well-known gradient descent (GD) algorithm and a new algorithm, which we call EG. They both maintain a weight vector using simple updates. For the GD algorithm, the update is based on subtracting the gradient of the squared error made on a prediction. The EG algorithm uses the components of the gradient in the exponents of factors that are used in updating the weight vector multiplicatively. We present worst-case loss bounds for EG and compare them to previously known bounds for the GD algorithm. The bounds suggest that the losses of the algorithms are in general incomparable, but EG has a much smaller loss if only a few components of the input are relevant for the predictions. We have performed experiments which show that our worst-case upper bounds are quite tight already on simple artificial data.
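The two updates described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `gd_update` and `eg_update`, the learning rate parameter `eta`, and the use of plain Python lists are all assumptions made here. The final normalization step in the EG sketch, which keeps the weights summing to one, is also an assumption, since the abstract does not mention it.

```python
import math


def gd_update(w, x, y, eta):
    """One gradient descent step on the squared error (w.x - y)^2."""
    y_hat = sum(wi * xi for wi, xi in zip(w, x))
    # The gradient of (y_hat - y)^2 with respect to w is 2 * (y_hat - y) * x;
    # GD subtracts eta times that gradient from the weights.
    return [wi - eta * 2.0 * (y_hat - y) * xi for wi, xi in zip(w, x)]


def eg_update(w, x, y, eta):
    """One exponentiated gradient step (sketch; assumes positive weights)."""
    y_hat = sum(wi * xi for wi, xi in zip(w, x))
    # Each weight is multiplied by a factor whose exponent is
    # minus eta times the corresponding gradient component.
    unnormalized = [
        wi * math.exp(-eta * 2.0 * (y_hat - y) * xi) for wi, xi in zip(w, x)
    ]
    # Assumption: renormalize so the weights sum to one.
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    w0 = [0.5, 0.5]          # hypothetical starting weights
    x, y = [1.0, 0.0], 1.0   # hypothetical instance and target
    print(gd_update(w0, x, y, eta=0.1))
    print(eg_update(w0, x, y, eta=0.1))
```

In this toy instance only the first input component is active, so GD changes the first weight additively while EG shifts weight toward it multiplicatively and shrinks the other weight through the normalization.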


Similar resources

Computing the Maximum Bichromatic Discrepancy, with Applications to Computer Graphics and Machine Learning Produced as Part of the Esprit Working Group in Neural and Computational Learning, Neurocolt 8556

Computing the maximum bichromatic discrepancy is an interesting theoretical problem with important applications in computational learning theory, computational geometry and computer graphics. In this paper we give algorithms to compute the maximum bichromatic discrepancy for simple geometric ranges, including rectangles and halfspaces. In addition, we give extensions to other discrepancy problems.


Decision Trees Have Approximate Fingerprints Produced as Part of the Esprit Working Group in Neural and Computational Learning, Neurocolt 8556

We prove that decision trees exhibit the "approximate fingerprint" property, and therefore are not polynomially learnable using only equivalence queries. A slight modification of the proof extends this result to several other representation classes of boolean concepts which have been studied in computational learning theory.


Probabilistic Analysis of Learning in Artificial Neural Networks: the PAC Model and Its Variants Produced as Part of the Esprit Working Group in Neural and Computational Learning, Neurocolt 8556

A version of this is to appear as a chapter in The Computational and Learning Complexity of Neural Networks (ed. Ian Parberry), MIT Press. Abstract: There are a number of mathematical approaches to the study of learning and generalization in artificial neural networks. Here we survey the 'probably approximately correct' (PAC) model of learning and some of its variants. These models, much-stud...


Neural Networks with Quadratic VC Dimension Produced as Part of the Esprit Working Group in Neural and Computational Learning, Neurocolt 8556 Submitted to Workshop on Neural Information Processing, NIPS'95

This paper shows that neural networks which use continuous activation functions have VC dimension at least as large as the square of the number of weights w. This result settles a long-standing open question, namely whether the well-known O(w log w) bound, known for hard-threshold nets, also held for more general sigmoidal nets. Implications for the number of samples needed for valid generaliza...



Publication date: 1996